Day 4 - Introduction to Data Analysis with R
Freie Universität Berlin - Theoretical Ecology
March 4, 2025
Today's schedule
Now - 14 (or 14.30 if you are still enthusiastic): Work on the data set(s)
14 (14.30) - 15: Short feedback round
15-16: Feedback, conclusion
Physicochemical properties of wine and quality judgements
dplyr
dplyr::mutate and as.factor() to transform the column
janitor::clean_names() function
Most important variables:
| variable | class | description |
|---|---|---|
| gender | character | Binary gender |
| event | character | Event name |
| medal | character | Medal type |
| athlete | character | Athlete name (LAST NAME first name) |
| abb | character | Country abbreviation |
| country | character | Country name |
| type | character | Type of sport |
| year | double | Year of games |
Get the data:
dplyr
!is.na(medal))
Atlantic marsh fiddler crab (Minuca pugnax)
| date | latitude | site | size | air_temp | air_temp_sd | water_temp | water_temp_sd | name |
|---|---|---|---|---|---|---|---|---|
| 2016-07-24 | 30 | GTM | 12.43 | 21.792 | 6.391 | 24.502 | 6.121 | Guana Tolomoto Matanzas NERR |
| 2016-07-24 | 30 | GTM | 14.18 | 21.792 | 6.391 | 24.502 | 6.121 | Guana Tolomoto Matanzas NERR |
| 2016-07-24 | 30 | GTM | 14.52 | 21.792 | 6.391 | 24.502 | 6.121 | Guana Tolomoto Matanzas NERR |
| 2016-07-24 | 30 | GTM | 12.94 | 21.792 | 6.391 | 24.502 | 6.121 | Guana Tolomoto Matanzas NERR |
| 2016-07-24 | 30 | GTM | 12.45 | 21.792 | 6.391 | 24.502 | 6.121 | Guana Tolomoto Matanzas NERR |
| 2016-07-24 | 30 | GTM | 12.99 | 21.792 | 6.391 | 24.502 | 6.121 | Guana Tolomoto Matanzas NERR |
Ideas - known methods
Temperature and ice duration on lakes since 19th century
Ice data:
| lakeid | ice_on | ice_off | ice_duration | year |
|---|---|---|---|---|
| Lake Mendota | NA | 1853-04-05 | NA | 1852 |
| Lake Mendota | 1853-12-27 | NA | NA | 1853 |
| Lake Mendota | 1855-12-18 | 1856-04-14 | 118 | 1855 |
| Lake Mendota | 1856-12-06 | 1857-05-06 | 151 | 1856 |
| Lake Mendota | 1857-11-25 | 1858-03-26 | 121 | 1857 |
| Lake Mendota | 1858-12-08 | 1859-03-14 | 96 | 1858 |
Temperature data:
| sampledate | year | ave_air_temp_adjusted |
|---|---|---|
| 1870-06-05 | 1870 | 20.0 |
| 1870-06-06 | 1870 | 18.3 |
| 1870-06-07 | 1870 | 17.5 |
| 1870-06-09 | 1870 | 13.3 |
| 1870-06-10 | 1870 | 13.9 |
| 1870-06-11 | 1870 | 15.0 |
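Both tables share a year column, so one possible workflow is to average the daily temperatures per year and then join that summary to the ice records. The sketch below is only an illustration: the objects ice, temperature, annual_temp and ice_temp are hypothetical names, and the values are invented toy data that merely reuse the column names from the previews.

```r
library(dplyr)

# Hypothetical toy data with the same column names as the previews.
# The values are invented for illustration and are NOT the real measurements.
ice <- tibble(
  lakeid = "Lake Mendota",
  year = c(1870, 1871, 1872),
  ice_duration = c(110, 125, NA)
)

temperature <- tibble(
  year = c(1870, 1870, 1871, 1871, 1872),
  ave_air_temp_adjusted = c(10, 12, 8, NA, 9)
)

# Step 1: one mean temperature per year (missing daily values are ignored)
annual_temp <- temperature %>%
  group_by(year) %>%
  summarise(mean_temp = mean(ave_air_temp_adjusted, na.rm = TRUE))

# Step 2: attach the annual mean to each ice record via the shared year column
ice_temp <- left_join(ice, annual_temp, by = "year")

ice_temp
```

left_join keeps every row of the ice table, so years without temperature data would simply get NA in mean_temp instead of being dropped.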
Ideas - known methods
dplyr::left_join to combine the tables with annual mean temperature and ice duration
dplyr session or look at the help
Data from Fu et al. 2015, Nature Cell Biology
Data found via Tutorial on heat maps using this data
3 CSV files:
heatmap_genes.csv: A list of the names of interesting genes to look at (Genes used in Figure 6b in paper)
DE_results.csv: Gene expression in luminal cells in pregnant versus lactating mice
normalized_counts: Normalized counts for genes for the different samples
Data cleaning:
janitor::clean_names function to make the column headers nicer
DE_results and normalized_counts by their shared columns
select to remove columns you don’t need for analysis to get a better overview
p_value < 0.01 & abs(logFC) > 0.58)
Data analysis:
pheatmap::pheatmap()
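A minimal sketch of preparing data for pheatmap::pheatmap(), assuming a table with one gene name column and one numeric column per sample. The object names (counts, count_matrix, scaled_matrix) and all values are hypothetical toy data, not the real counts from the paper.

```r
# Hypothetical toy counts: 4 genes x 3 samples (invented values)
counts <- data.frame(
  gene = c("gene_a", "gene_b", "gene_c", "gene_d"),
  sample_1 = c(10, 200, 35, 4),
  sample_2 = c(12, 180, 70, 8),
  sample_3 = c(30, 150, 20, 6),
  stringsAsFactors = FALSE
)

# Build a numeric matrix: drop the gene column and keep it as row names
count_matrix <- as.matrix(counts[, -1])
rownames(count_matrix) <- counts$gene

# scale() standardises columns, so transpose twice to scale each gene (row)
scaled_matrix <- t(scale(t(count_matrix)))

# Only draw the heat map if the pheatmap package is installed
if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(scaled_matrix)
  # Alternative: pass the raw matrix and let pheatmap scale the rows itself
  # pheatmap::pheatmap(count_matrix, scale = "row")
}
```

Scaling by row puts every gene on the same scale (mean 0, standard deviation 1), so highly expressed genes do not dominate the colour range.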
pheatmap takes a matrix as input (use as.matrix on tibble to transform)
scale function
pheatmap can scale but with ggplot you have to scale before plotting
corrplot package for correlation plots
factoextra package for PCA visualization
NA values: use tidyr::drop_na() to remove all rows containing NA values from the data first
Working with real research data
Meet in your group (if you want)
Work on your data set
Take breaks as you need and be back at 2 p.m.
Keep an eye on your group and the general chat
In 1-2 mins:
What was the highlight of your analysis?
What was difficult?
If you want: Share a screenshot in the chat or share your screen
Please take 10 mins to complete the feedback survey for the Graduate center (don’t use Internet Explorer)
We learned a lot of stuff!
Selina Baldauf // Bring your own data